Necessity of Feature Selection when Augmenting Tweet Sentiment Feature Spaces with Emoticons
Authors
Abstract
Tweet sentiment classification seeks to identify the emotional polarity of a tweet. One potential way to enhance classification performance is to include emoticons as features. Emoticons are textual representations of faces expressing various emotions. They are created through combinations of letters, punctuation marks, and symbols, and are frequently found within tweets. While emoticons have been used as features for sentiment classification, the importance of their inclusion has not been directly measured. In this work, we seek to determine whether the addition of emoticon features improves classifier performance. We also investigate how high dimensionality affects the addition of emoticon features. We conducted experiments testing the impact of using emoticon features, both with and without feature selection. Classifiers were trained using four different learners and either emoticons, unigrams, or both as features. Feature selection was conducted using five filter-based feature rankers with four feature subset sizes. Our results showed that the choice of feature set (emoticons, unigrams, or both) had no significant impact in our initial tests without feature selection; however, with any of the tested feature selection techniques, augmenting unigram features with emoticon features resulted in significantly better performance than unigrams alone. Additionally, we investigate how the addition of emoticons changes the top features selected by the rankers.
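The setup described in the abstract (unigram features, emoticon features, or both, followed by a filter-based ranker that keeps a top-k subset) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the five rankers, so chi-squared stands in as one common filter-based ranker; the emoticon pattern, the `EMO_`/`UNI_` prefixes, the function names, and the six toy tweets are hypothetical and are not the paper's data or code.

```python
import re
from collections import Counter

# Hypothetical emoticon pattern: eyes, optional nose, mouth (e.g. :) :-( ;D =P).
EMOTICON_RE = re.compile(r"[:;=][\-o\*']?[\)\(\]\[dDpP/\\|]")
WORD_RE = re.compile(r"[a-z']+")


def extract_features(tweet, use_unigrams=True, use_emoticons=True):
    """Return the set of binary features present in one tweet."""
    feats = set()
    if use_emoticons:
        feats.update("EMO_" + e for e in EMOTICON_RE.findall(tweet))
    if use_unigrams:
        # Remove emoticons first so their characters do not leak into words.
        text = EMOTICON_RE.sub(" ", tweet).lower()
        feats.update("UNI_" + w for w in WORD_RE.findall(text))
    return feats


def chi_squared_rank(feature_sets, labels):
    """Rank binary features by the chi-squared statistic against a 0/1 label."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_counts, neg_counts = Counter(), Counter()
    for feats, y in zip(feature_sets, labels):
        (pos_counts if y == 1 else neg_counts).update(feats)
    scores = {}
    for f in set(pos_counts) | set(neg_counts):
        a, b = pos_counts[f], neg_counts[f]   # feature present: pos / neg
        c, d = n_pos - a, n_neg - b           # feature absent:  pos / neg
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[f] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Made-up toy data (1 = positive, 0 = negative), for illustration only.
tweets = [
    ("love this phone :)", 1),
    ("great day :)", 1),
    ("what a day :)", 1),
    ("hate this phone :(", 0),
    ("awful day :(", 0),
    ("what a mess :(", 0),
]
labels = [y for _, y in tweets]

both = [extract_features(t) for t, _ in tweets]
unigrams_only = [extract_features(t, use_emoticons=False) for t, _ in tweets]

ranked_both = chi_squared_rank(both, labels)
ranked_unigrams = chi_squared_rank(unigrams_only, labels)

k = 2  # feature subset size (the paper tests four sizes, not given here)
print("top-k with emoticons:   ", ranked_both[:k])
print("top-k with unigrams only:", ranked_unigrams[:k])
```

Comparing the two top-k lists shows how adding emoticons changes which features a ranker selects, which is the kind of comparison the abstract mentions. The selected subset would then be passed to a classifier; that step is omitted here.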
Similar resources
Impact of Feature Selection Techniques for Tweet Sentiment Classification
Sentiment analysis of tweets is a powerful application of mining social media sites that can be used for a variety of social sensing tasks. Common feature engineering techniques frequently result in a large number of features being generated to represent tweets. Many of these features may degrade classifier performance and increase computational cost. Feature selection techniques can be used...
Micro-Blog Emotion Classification Method Research Based on Cross-Media Features
Although sentiment analysis of tweets has attracted increasing attention in recent years, most existing methods mainly analyze the text information. Because of the fuzziness of emotional expression, users are more likely to use mixed modes, such as words and images, to express their feelings. This paper proposes a tweet emotion classification method based on fused features, which combines the...
TwitterHawk: A Feature Bucket Based Approach to Sentiment Analysis
This paper describes TwitterHawk, a system for sentiment analysis of tweets which participated in the SemEval-2015 Task 10, Subtasks A through D. The system performed competitively, most notably placing 1st in topic-based sentiment classification (Subtask C) and ranking 4th out of 40 in identifying the sentiment of sarcastic tweets. Our submissions in all four subtasks used a supervised learning app...
Feature Extraction and Efficiency Comparison Using Dimension Reduction Methods in Sentiment Analysis Context
Nowadays, users can share their ideas and opinions with widespread access to the Internet and especially social networks. On the other hand, the analysis of people's feelings and ideas can play a significant role in the decision making of organizations and producers. Hence, sentiment analysis or opinion mining is an important field in natural language processing. One of the most common ways to ...
Enhance Polarity Classification on Social Media through Sentiment-based Feature Expansion
Online social networking communities usually exhibit complex collective behaviors. Since emotions play a relevant role in human decision making, understanding how online networks drive human mood states becomes a task of considerable interest. One of the most relevant tasks in Sentiment Analysis is Polarity Classification, aimed at classifying the sentiment behind texts. We formulated different a...